{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Google Firestore in Datastore mode\n",
    "\n",
    "> [Firestore in Datastore mode](https://cloud.google.com/datastore) is a serverless document-oriented database that scales to meet any demand. Extend your database application to build AI-powered experiences leveraging Datastore's Langchain integrations.\n",
    "\n",
    "This notebook goes over how to use [Firestore in Datastore mode](https://cloud.google.com/datastore) to [save, load and delete langchain documents](https://python.langchain.com/docs/modules/data_connection/document_loaders/) with `DatastoreLoader` and `DatastoreSaver`.\n",
    "\n",
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/googleapis/langchain-google-datastore-python/blob/main/docs/document_loader.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Before You Begin\n",
    "\n",
    "To run this notebook, you will need to do the following:\n",
    "\n",
    "* [Create a Google Cloud Project](https://developers.google.com/workspace/guides/create-project)\n",
    "* [Create a Datastore database](https://cloud.google.com/datastore/docs/manage-databases)\n",
    "\n",
    "After confirmed access to database in the runtime environment of this notebook, filling the following values and run the cell before running example scripts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# @markdown Please specify a source for demo purpose.\n",
    "SOURCE = \"test\"  # @param {type:\"Query\"|\"CollectionGroup\"|\"DocumentReference\"|\"string\"}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 🦜🔗 Library Installation\n",
    "\n",
    "The integration lives in its own `langchain-google-datastore` package, so we need to install it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%pip install -upgrade --quiet langchain-google-datastore"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Colab only**: Uncomment the following cell to restart the kernel or use the button to restart the kernel. For Vertex AI Workbench you can restart the terminal using the button on top."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# # Automatically restart kernel after installs so that your environment can access the new packages\n",
    "# import IPython\n",
    "\n",
    "# app = IPython.Application.instance()\n",
    "# app.kernel.do_shutdown(True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### ☁ Set Your Google Cloud Project\n",
    "Set your Google Cloud project so that you can leverage Google Cloud resources within this notebook.\n",
    "\n",
    "If you don't know your project ID, try the following:\n",
    "\n",
    "* Run `gcloud config list`.\n",
    "* Run `gcloud projects list`.\n",
    "* See the support page: [Locate the project ID](https://support.google.com/googleapi/answer/7014113)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# @markdown Please fill in the value below with your Google Cloud project ID and then run the cell.\n",
    "\n",
    "PROJECT_ID = \"my-project-id\"  # @param {type:\"string\"}\n",
    "\n",
    "# Set the project id\n",
    "!gcloud config set project {PROJECT_ID}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 🔐 Authentication\n",
    "\n",
    "Authenticate to Google Cloud as the IAM user logged into this notebook in order to access your Google Cloud Project.\n",
    "\n",
    "- If you are using Colab to run this notebook, use the cell below and continue.\n",
    "- If you are using Vertex AI Workbench, check out the setup instructions [here](https://github.com/GoogleCloudPlatform/generative-ai/tree/main/setup-env)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.colab import auth\n",
    "\n",
    "auth.authenticate_user()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### API Enablement\n",
    "The `langchain-google-datastore` package requires that you [enable the Datastore API](https://console.cloud.google.com/flows/enableapi?apiid=datastore.googleapis.com) in your Google Cloud Project."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# enable Datastore API\n",
    "!gcloud services enable datastore.googleapis.com"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Basic Usage"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Save documents\n",
    "\n",
    "`DatastoreSaver` can store Documents into Datastore. By default it will try to extract the Document reference from the metadata\n",
    "\n",
    "Save langchain documents with `DatastoreSaver.upsert_documents(<documents>)`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_core.documents import Document\n",
    "from langchain_google_datastore import DatastoreSaver\n",
    "\n",
    "data = [Document(page_content=\"Hello, World!\")]\n",
    "saver = DatastoreSaver()\n",
    "saver.upsert_documents(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Save documents without reference\n",
    "\n",
    "If a collection is specified the documents will be stored with an auto generated id."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "saver = DatastoreSaver(\"Collection\")\n",
    "\n",
    "saver.upsert_documents(data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Save documents with other references"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "doc_ids = [\"AnotherCollection/doc_id\", \"foo/bar\"]\n",
    "saver = DatastoreSaver()\n",
    "\n",
    "saver.upsert_documents(documents=data, document_ids=doc_ids)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load from Collection or SubCollection"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Load langchain documents with `DatastoreLoader.load()` or `Datastore.lazy_load()`. `lazy_load` returns a generator that only queries database during the iteration. To initialize `DatastoreLoader` class you need to provide:\n",
    "\n",
    "1. `source` - An instance of a Query, CollectionGroup, DocumentReference or the single `\\`-delimited path to a Datastore collection`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_google_datastore import DatastoreLoader\n",
    "\n",
    "loader_collection = DatastoreLoader(\"Collection\")\n",
    "loader_subcollection = DatastoreLoader(\"Collection/doc/SubCollection\")\n",
    "\n",
    "\n",
    "data_collection = loader_collection.load()\n",
    "data_subcollection = loader_subcollection.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load a single Document"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.cloud import datastore\n",
    "\n",
    "client = datastore.Client()\n",
    "doc_ref = client.collection(\"foo\").document(\"bar\")\n",
    "\n",
    "loader_document = DatastoreLoader(doc_ref)\n",
    "\n",
    "data = loader_document.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load from CollectionGroup or Query"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.cloud.datastore import CollectionGroup, FieldFilter, Query\n",
    "\n",
    "col_ref = client.collection(\"col_group\")\n",
    "collection_group = CollectionGroup(col_ref)\n",
    "\n",
    "loader_group = DatastoreLoader(collection_group)\n",
    "\n",
    "col_ref = client.collection(\"collection\")\n",
    "query = col_ref.where(filter=FieldFilter(\"region\", \"==\", \"west_coast\"))\n",
    "\n",
    "loader_query = DatastoreLoader(query)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Delete documents\n",
    "\n",
    "Delete a list of langchain documents from Datastore collection with `DatastoreSaver.delete_documents(<documents>)`.\n",
    "\n",
    "If document ids is provided, the Documents will be ignored."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "saver = DatastoreSaver()\n",
    "\n",
    "saver.delete_documents(data)\n",
    "\n",
    "# The Documents will be ignored and only the document ids will be used.\n",
    "saver.delete_documents(data, doc_ids)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Advanced Usage"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Load documents with customize document page content & metadata\n",
    "\n",
    "The arguments of `page_content_fields` and `metadata_fields` will specify the Datastore Document fields to be written into LangChain Document `page_content` and `metadata`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "loader = DatastoreLoader(\n",
    "    source=\"foo/bar/subcol\",\n",
    "    page_content_fields=[\"data_field\"],\n",
    "    metadata_fields=[\"metadata_field\"],\n",
    ")\n",
    "\n",
    "data = loader.load()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Customize Page Content Format\n",
    "\n",
    "When the `page_content` contains only one field the information will be the field value only. Otherwise the `page_content` will be in JSON format."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Customize Connection & Authentication"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from google.auth import compute_engine\n",
    "from google.cloud.datastore import Client\n",
    "\n",
    "client = Client(database=\"non-default-db\", creds=compute_engine.Credentials())\n",
    "loader = DatastoreLoader(\n",
    "    source=\"foo\",\n",
    "    client=client,\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
